Emergence of Scale-Free Networks from Local Connectivity and Communication Trade-offs

We introduce a new mechanism of connectivity evolution in networks to account for the
emergence of scale-free behavior. The mechanism works on a fixed set of nodes and promotes
growth from a minimally connected initial topology by the addition of edges. A new edge is
added between two nodes depending on the trade-off between a gain and a cost function of
local connectivity and communication properties. We report on simulation results that
indicate the appearance of power-law distributions of node degrees for selected parameter
combinations.

The topology of large-scale complex networks such as the Internet and the WWW is in general
not known. The study of such networks has therefore relied on modeling them as random
graphs [1], focusing almost exclusively on the distribution of node degrees. Unlike the
classic case pioneered by Erdős and Rényi, for the Internet, the WWW, and several other
networks it appears that degrees are distributed according to a power law, not a Poisson
distribution. That is, the probability that a randomly chosen node has degree $k$ is
proportional to $k^{-\tau}$, in general with $2 < \tau < 3$.

These findings are based on probe samplers in the case of both the Internet and the WWW,
and are generally regarded as reasonably accurate. However, part of the underlying
machinery has recently been shown to be somewhat unreliable in the case of the Internet.
For example, it has been demonstrated experimentally that the usual mechanism of inferring
breadth-first-search trees from the probe results can underestimate the value of $\tau$
significantly when the graph does have a power-law degree distribution. Likewise, it is
possible to argue formally that such a mechanism can in some cases lead to the conclusion
of a power-law degree distribution when in fact the graph's degrees are distributed in some
other way.

In recent years, and notwithstanding these limitations, considerable effort has been put
into discovering mechanisms of network growth that give rise to a power-law degree
distribution. Especially noteworthy is the mechanism of preferential attachment, which
underlies the so-called Barabási-Albert model as well as variations [11, 12, 13, 14] and
generalizations thereof. Preferential attachment is the policy whereby a new edge is added
to the network between a new node and a pre-existing one with probability proportional to
how many edges are already incident on the pre-existing node, that is, its current degree.
The generalization of [15] incorporates both this policy and a copying mechanism. We refer
the reader to [17] for a review of the essential mathematical results related to these
models.
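
As a point of comparison for what follows, the preferential-attachment policy described above can be sketched in a few lines. This is only a minimal illustration, assuming that the graph starts from a single edge and that each new node brings exactly one edge; the function name `preferential_attachment` and its parameters are hypothetical and do not come from the models cited.

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a graph node by node (illustrative sketch, one edge per new node).

    Each new node attaches to one pre-existing node chosen with probability
    proportional to that node's current degree.
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}   # assumed starting point: a single edge 0-1
    edges = [(0, 1)]
    # Every edge contributes both of its endpoints to this list, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        degree[new] = 1
        degree[target] += 1
        endpoints.extend((new, target))
    return edges, degree

edges, degree = preferential_attachment(1000)
```

Note that choosing `target` requires the degrees of all existing nodes, which is the kind of global knowledge the discussion below argues against.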

While the study of complex networks from the perspective of node-degree distributions seems
sound and has given rise to important discoveries related to global properties, such as the
nature and size of a network's connected components and diameter, explaining the formation
of the network from the same perspective (e.g., by evoking preferential attachment) is
unreasonable for at least two reasons. The first is that the addition of a particular edge
most definitely does not depend on global properties such as the distribution of node
degrees at the time of expansion. The second reason is that, at least for computer networks
like the Internet, it makes no sense to assume that the degree distribution, rather than
some cost- or performance-related entity, is the essential driving force behind the
evolution of the network's topology.

Models that depend on node-degree distributions are then adequate descriptive models, in
the sense that they give rise to the desired power-law functional form, but constitute poor
generative models. This has also been recognized elsewhere (cf., e.g., [13]) and has
resulted in the appearance of alternative models. These, however, are also dependent on
global properties, such as one-to-all distances, and therefore seem implausible as well.

We work on the premise that networks such as the Internet or the WWW, although
fast-growing, appear not to acquire new nodes fast enough to impact their main topological
properties significantly. Thus the model that we study is targeted at the evolution of the
connectivity of computer networks, and promotes network growth on a fixed set of nodes by
incrementally adding edges between nodes as the result of comparing a gain function and a
cost function for each edge addition. If $i$ and $j$ are nodes not currently connected by
an edge in the network, the gain incurred when adding an edge between them depends only on
the immediate neighborhoods of $i$ and $j$ and on the current distance between $i$ and $j$.
The cost of the addition, in turn, is also dependent solely upon $i$ and $j$ and seeks to
reflect both the cost of deploying the communications link itself and the cost of upgrading
the connection capabilities of nodes $i$ and $j$ to accommodate the new link. The edge
joining $i$ and $j$ is added to the network if the gain surpasses the cost.

We model the evolution of network connectivity as the sequence $G^0, G^1, \ldots$ of
undirected graphs, all having the same set of $n$ nodes. We assume that $G^0$ is a tree
that spans all the nodes; $G^0$ is therefore connected and has $n - 1$ edges. For
$t \ge 0$, $G^{t+1}$ is obtained from $G^t$ by randomly selecting two nodes, say $i$ and
$j$, that are not directly connected by an edge, and then adding an edge between them if
the gain incurred with the addition of the edge is greater than its cost. Otherwise, we
simply let $G^{t+1} = G^t$. All graphs in the sequence are then guaranteed to be connected
and to remain free of multiple edges and self-loops. We let $d_{ij}^t$ denote the distance
between $i$ and $j$ in $G^t$, and $n_i^t$ the degree of node $i$ in $G^t$. We also let
$N_i^t(j)$ be the set comprising every neighbor $k$ of node $i$ in $G^t$ for which
$d_{jk}^t > 2$, and similarly $N_{ij}^t$ be the set of unordered node pairs $(k, l)$ such
that either $k$ or $l$ is a neighbor of $i$, the other node in the pair is a neighbor of
$j$, and furthermore $d_{kl}^t > 3$.
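
The set definitions above can be made concrete with a short sketch. It assumes the graph is given as an adjacency dictionary and, for simplicity, runs full breadth-first searches rather than depth-limited ones; the names `bfs_distances` and `neighborhood_sets` are hypothetical helpers introduced only for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def neighborhood_sets(adj, i, j):
    """Return the sets N_j(i), N_i(j), and N_ij for a connected graph."""
    dist_i = bfs_distances(adj, i)
    dist_j = bfs_distances(adj, j)
    # Neighbors of j that are more than 2 edges away from i.
    n_j_of_i = {k for k in adj[j] if dist_i[k] > 2}
    # Neighbors of i that are more than 2 edges away from j.
    n_i_of_j = {k for k in adj[i] if dist_j[k] > 2}
    # Unordered pairs (k, l), k a neighbor of i and l a neighbor of j,
    # that are more than 3 edges away from each other.
    pairs = set()
    for k in adj[i]:
        dist_k = bfs_distances(adj, k)
        for l in adj[j]:
            if dist_k[l] > 3:
                pairs.add(frozenset((k, l)))
    return n_j_of_i, n_i_of_j, pairs

# Example: the path 0-1-2-3-4-5-6 with i = 1 and j = 5.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
n_j_of_i, n_i_of_j, pairs = neighborhood_sets(path, 1, 5)
```

On this path, both neighbors of node 5 lie more than 2 edges from node 1 and both neighbors of node 1 lie more than 2 edges from node 5, while the pair (2, 4) is excluded because its distance is only 2.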

Let $g_{ij}^t$ denote the gain incurred with the addition of an edge between $i$ and $j$ to
$G^t$ when $d_{ij}^t > 1$. In our model, we let $g_{ij}^t$ be some upper bound on the
number of edges by which distances between certain nodes become shorter after the addition
of that edge. The distances we consider to establish this upper bound are some of those
that involve $i$ or $j$ directly, or yet nodes in their immediate neighborhoods in $G^t$.
Specifically, we consider $d_{ij}^t$, $d_{ik}^t$ for $k \in N_j^t(i)$ (neighbors of $j$
that are more than 2 edges away from $i$ in $G^t$), $d_{jk}^t$ for $k \in N_i^t(j)$
(neighbors of $i$ that are more than 2 edges away from $j$ in $G^t$), and finally
$d_{kl}^t$ for $(k, l) \in N_{ij}^t$ (node pairs that are more than 3 edges away from each
other in $G^t$, one being a neighbor of $i$, the other a neighbor of $j$).
Upper bounds on each distance in the latter three groups are, clearly, $d_{ij}^t + 1$,
$d_{ij}^t + 1$, and $d_{ij}^t + 2$, respectively. An upper bound on the sum of all
distances considered is then

$$ d_{ij}^t + (d_{ij}^t + 1)|N_j^t(i)| + (d_{ij}^t + 1)|N_i^t(j)| + (d_{ij}^t + 2)|N_{ij}^t|, \tag{1} $$

where we use $|X|$ to denote the cardinality of set $X$. The addition of an edge joining
$i$ and $j$ causes the sum of all these distances to become

$$ 1 + 2|N_j^t(i)| + 2|N_i^t(j)| + 3|N_{ij}^t|, \tag{2} $$

and consequently the overall number of edges by which the distances become shorter is at
most

$$ g_{ij}^t = (d_{ij}^t - 1)\left[1 + |N_j^t(i)| + |N_i^t(j)| + |N_{ij}^t|\right]. \tag{3} $$

One crucial aspect of the gain expressed in (3) is that, in the context of computer
networks, it depends exclusively on information that can be obtained by tracing routes on
$G^t$. This certainly holds for the determination of $d_{ij}^t$, and holds also for
determining the sets $N_j^t(i)$, $N_i^t(j)$, and $N_{ij}^t$, provided only that the process
of tracing routes is controlled for constant depth. However, given the nature of routing
algorithms such as those of the Internet, tracing a route between $i$ and $j$ on $G^t$ is
only guaranteed to provide an upper bound on $d_{ij}^t$, which is nonetheless consonant
with the expression in (3) being itself an upper bound on total distance improvement.

Besides the gain in (3), the decision regarding the addition of an edge between $i$ and $j$
depends also on the cost of this addition. We denote this cost by $c_{ij}^t$ and define it
in such a way that both the cost of deploying a communications link and the cost of
possibly upgrading the connection capabilities of $i$ or $j$ are taken into account. The
former of these we denote by $C$ and assume to be independent of $t$, $i$, or $j$.

As for the latter of the two cost components, we assume that the number of connections a
node can sustain at any time is at most $\lceil \alpha^z \rceil$ for some fixed
$\alpha > 1$ and some $z \in \{0, 1, \ldots\}$ that does not decrease as time elapses (we
use $\lceil x \rceil$ to denote the least integer that is no less than $x$). If the degree
of $i$ or $j$ in $G^t$ is precisely such a maximum number of connections, then the cost of
connecting $i$ to $j$ directly involves the cost of upgrading the connection capabilities
of $i$ or $j$, as the case may be, to $\lceil \alpha^{z+w} \rceil$, where $w$ is the least
integer for which $\lceil \alpha^{z+w} \rceil > \lceil \alpha^z \rceil$. We further assume,
for some fixed $\beta > 1$, that the cost of endowing the node with the capability of
connecting to $\lceil \alpha^z \rceil$ other nodes is proportional to $\beta^z$.

Let $n_k$ denote the number of connections of some node $k$. We model the scenario in which
the cost incurred with the upgrade of $n_k$ from $\lceil \alpha^z \rceil$ to
$\lceil \alpha^{z+w} \rceil$ is proportional to $\beta^{z+w} - \beta^z$ (only the cost
difference is paid) and is furthermore amortized along the deployment of each new
connection (as opposed to being paid in full when the $(\lceil \alpha^z \rceil + 1)$st
connection is deployed). If we let $f(n_k)$ be the cost portion to be incurred when the
number of connections is $n_k$ and for simplicity disregard the fact that connections
necessarily occur in discrete numbers, then it follows that

$$ \int_{\lceil \alpha^z \rceil}^{\lceil \alpha^{z+w} \rceil} f(n_k)\, dn_k \propto \beta^{z+w} - \beta^z. \tag{4} $$

Consequently, since $\beta^z = (\alpha^z)^{\log_\alpha \beta}$, the accumulated cost scales
as $n_k^{\log_\alpha \beta}$, and differentiating with respect to $n_k$ gives

$$ f(n_k) \propto n_k^{\log_\alpha(\beta/\alpha)}. \tag{6} $$

Setting $\alpha = \beta$ leads $f(n_k)$ to be constant with respect to $n_k$; setting
$\alpha \ne \beta$ leads $f(n_k)$ to vary either directly ($\alpha < \beta$) or inversely
($\alpha > \beta$) with $n_k$.
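
The scaling of the amortized upgrade cost can be checked numerically. The sketch below assumes the continuous approximation used in the text (ceilings ignored) and takes $w = 1$; the values $\alpha = 2$ and $\beta = 3$ are arbitrary illustrative choices, and the helper names are hypothetical. It integrates $n^{\log_\alpha(\beta/\alpha)}$ between $\alpha^z$ and $\alpha^{z+1}$ and compares the result with $\beta^{z+1} - \beta^z$.

```python
import math

def gamma_exponent(alpha, beta):
    """Exponent log_alpha(beta / alpha) of the per-connection cost."""
    return math.log(beta / alpha, alpha)

def amortized_total(alpha, beta, z, w=1, steps=20000):
    """Midpoint-rule integral of n**gamma from alpha**z to alpha**(z + w)."""
    g = gamma_exponent(alpha, beta)
    lo, hi = alpha ** z, alpha ** (z + w)
    h = (hi - lo) / steps
    return sum((lo + (s + 0.5) * h) ** g for s in range(steps)) * h

alpha, beta = 2.0, 3.0
# If the per-connection cost has the right exponent, this ratio does not
# depend on z: the integral equals (beta**(z+1) - beta**z) / log_alpha(beta).
ratios = [
    amortized_total(alpha, beta, z) / (beta ** (z + 1) - beta ** z)
    for z in range(1, 5)
]
```

The ratios agree for every $z$, which is what proportionality to $\beta^{z+w} - \beta^z$ requires under these assumptions.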

FIG. 1: (Color online) Average node-degree distributions for $n = 512, 1024$, $D = 0.1$,
and $\gamma = 0.9$. Values for $\tau$ are between 2.9 and 4.2 for the lower degrees, 2.7
and 3.0 for the higher degrees.

We then have

$$ c_{ij}^t = C + D\left[(n_i^t)^\gamma + (n_j^t)^\gamma\right] \tag{7} $$

for some constant $D$ and $\gamma = \log_\alpha(\beta/\alpha)$. An edge is added to $G^t$
between nodes $i$ and $j$ to yield $G^{t+1}$ if $g_{ij}^t > c_{ij}^t$. If not, then
$G^{t+1} = G^t$. By the nature of (3) and (7), this decision involves only the distance
between $i$ and $j$ in $G^t$, in addition to other quantities that depend exclusively on
the surroundings of $i$ and $j$ within a constant radius in $G^t$. It is then essentially a
local decision.
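
The local decision rule can be sketched as follows. The sketch reads the gain as $(d_{ij}^t - 1)$ times one plus the three set cardinalities, and the cost as $C + D[(n_i^t)^\gamma + (n_j^t)^\gamma]$. The function names are hypothetical; the parameter values in the example are the common values $C = 100$, $D = 0.1$, $\gamma = 0.9$ used in the simulations, while the distances, set sizes, and degrees are made up for illustration.

```python
def gain(d_ij, size_nj_i, size_ni_j, size_nij):
    """Upper bound on the total distance reduction from adding edge (i, j)."""
    return (d_ij - 1) * (1 + size_nj_i + size_ni_j + size_nij)

def cost(deg_i, deg_j, C, D, gamma):
    """Link deployment cost plus amortized upgrade cost at both endpoints."""
    return C + D * (deg_i ** gamma + deg_j ** gamma)

def should_add_edge(d_ij, size_nj_i, size_ni_j, size_nij,
                    deg_i, deg_j, C=100.0, D=0.1, gamma=0.9):
    """The edge is added only when the gain strictly exceeds the cost."""
    return gain(d_ij, size_nj_i, size_ni_j, size_nij) > cost(deg_i, deg_j, C, D, gamma)

# Two nodes 4 edges apart with small neighborhoods: gain 24, cost above 100.
rejected = should_add_edge(4, 2, 2, 3, deg_i=2, deg_j=2)
# Two nodes 10 edges apart with larger neighborhoods: gain 324.
accepted = should_add_edge(10, 5, 5, 25, deg_i=6, deg_j=6)
```

Every argument of `should_add_edge` is obtainable from the surroundings of $i$ and $j$, which is the sense in which the decision is local.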

We have conducted computer simulations for selected combinations of the $C$, $D$, and
$\gamma$ parameters. Each simulation starts with a randomly chosen instance of $G^0$ and
proceeds through $t = 3000n$. A $G^0$ instance is generated on the $n$ initially isolated
nodes by progressively selecting node pairs at random and directly interconnecting them if
no path exists between them; because $G^0$ is a tree, it is necessary and sufficient that
$n - 1$ such interconnections be performed.
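
The generation of a $G^0$ instance can be sketched as below. The text only requires testing whether a path already exists between the two selected nodes; using a union-find structure for that test is an implementation choice made here, not something stated in the original, and the function name `random_spanning_tree` is hypothetical.

```python
import random

def random_spanning_tree(n, seed=0):
    """Join random node pairs that are not yet connected until n - 1 edges exist."""
    rng = random.Random(seed)
    parent = list(range(n))   # union-find forest over the n nodes

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    while len(edges) < n - 1:
        i, j = rng.sample(range(n), 2)   # two distinct nodes
        root_i, root_j = find(i), find(j)
        if root_i != root_j:             # no path between i and j yet
            parent[root_i] = root_j
            edges.append((i, j))
    return edges

tree = random_spanning_tree(64)
```

Because an edge is only added between nodes in different components, the result has no cycles, and after $n - 1$ additions it spans all the nodes.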

At each step of a simulation the distance $d_{ij}^t$ must be calculated on $G^t$. While on
a real computer network such a distance (or an upper bound thereof) is readily available
from the network's routing structure (as noted earlier), calculating $d_{ij}^t$ seems to be
asymptotically no easier than finding the distances between a given node and all others in
$G^t$. For connected graphs, this requires $O(m^t)$ time [24], where we use $m^t$ to denote
the number of edges of $G^t$. It is therefore a time-consuming procedure, and progressively
more so as the simulation is carried on and the graph tends to become denser. The
consequence of this for the present study is that the value of $n$ is somewhat limited, and
so is the number of independent $G^0$ instances that can be used for statistical
significance.

FIG. 2: (Color online) Average node-degree distributions for $n = 512, 1024$, $C = 100$,
and $\gamma = 0.9$. Values for $\tau$ are between 2.5 and 2.9 for the lower degrees, 2.9
and 5.2 for the higher degrees.
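
A breadth-first search of the kind alluded to above can be sketched as follows, assuming an adjacency-dictionary representation; the function name `distance` is hypothetical. The search stops as soon as the target is reached, which helps in practice but, in the worst case, still visits every edge of the graph.

```python
from collections import deque

def distance(adj, i, j):
    """Number of edges on a shortest path from i to j, or None if unreachable."""
    if i == j:
        return 0
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == j:          # early exit once the target is found
                    return dist[v]
                queue.append(v)
    return None

# Example: a ring of six nodes.
ring = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
d_03 = distance(ring, 0, 3)
```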

Our results are shown in Figures 1 through 3 where, respectively, the value of each of $C$,
$D$, and $\gamma$ is varied while the other two parameters remain fixed at a set of common
values ($C = 100$, $D = 0.1$, and $\gamma = 0.9$). For each combination of the three
parameters we show results for two values of $n$. As the figures indicate, our model for
network growth does indeed give rise to a scale-free pattern of behavior in which a vast
majority of the nodes have low degrees while a few high-degree nodes are nonetheless
present.
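
One simple way to read an exponent off a degree distribution is a least-squares fit on log-log axes. The sketch below is only an illustration of that idea and not necessarily the procedure used to obtain the values quoted in the figure captions; the function names are hypothetical, and the fit is applied to synthetic data that follow $k^{-3}$ exactly.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Fraction of nodes having each observed degree."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in sorted(counts.items())}

def loglog_slope(dist):
    """Negated least-squares slope of log p(k) against log k (requires k >= 1)."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return -num / den

# Synthetic, unnormalized power law with exponent 3 (normalization does not
# affect the slope).
synthetic = {k: k ** -3.0 for k in range(1, 50)}
tau_estimate = loglog_slope(synthetic)
small = degree_distribution([1, 1, 2, 3])
```

On real simulation output the fit would have to be restricted to a degree range, since the distributions reported here show different slopes for lower and higher degrees.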

The figures also indicate, except for the highest $\gamma$ values in our simulations
($\gamma = 1.5, 1.9$, cf. Figure 3), that the node-degree distribution seems to settle at
two distinct power-law regimes, one for node degrees below roughly 100, the other for those
above this threshold. While a definitive explanation of why this happens depends upon a
more detailed analysis of the process whereby nodes acquire ever higher degrees, we
conjecture that it may go along the following lines.

In our model, nodes acquire higher degrees one unit at a time when two of them become
directly connected to each other as a result of comparing the gain in (3) to the cost in
(7). As degrees become larger and the network denser, it also happens that distances
between node pairs become shorter. The node sets whose cardinalities appear in (3) tend,
therefore, to become smaller. Together, these trends make it progressively harder for gains
to surpass costs and for degrees to continue increasing.

FIG. 3: (Color online) Average node-degree distributions for $n = 512, 1024$, $C = 100$,
and $D = 0.1$. Values for $\tau$ are between 2.9 and 3.0 for the lower degrees, 2.1 and 2.9
for the higher degrees.

However, a few high-degree nodes do appear and the dynamics of network growth may
occasionally consider joining two of them together. Because they have high degrees, it may
happen that the node-set cardinalities in (3) become once again relatively non-negligible
and a few high-degree node pairs do indeed become interconnected. Our results indicate
that, if this is what happens, then its occurrence inaugurates a new power-law regime for
the highest degrees. In this case, what we witness may be the emergence of some sort of
hierarchical organization within the network, not unlike what happens with the Internet
[23], which is inherently organized in just such a way (that a single power-law regime
should be reported in topology measurements may be due exclusively to the fact that they
are constrained to within one single level of the hierarchy).

We acknowledge partial support from CNPq, CAPES, 
FAPERJ BBP grants, and the PRONEX initiative under 
contract PRONEX/FAJER 26.171.176.2003. 